Stereophonic audio signal decompression switching to monaural audio signal

ABSTRACT

A communication system for sending a sequence of symbols on a communication link. The system includes a transmitter for placing information indicative of the sequence of symbols on the communication link and a receiver for receiving the information placed on the communication link by the transmitter. The transmitter includes a clock for defining successive frames, each of the frames including M time intervals, where M is an integer greater than 1. A modulator modulates each of M carrier signals with a signal related to the value of one of the symbols thereby generating a modulated carrier signal corresponding to each of the carrier signals. The modulated carriers are combined into a sum signal which is transmitted on the communication link. The carrier signals include first and second carriers, the first carrier having a different bandwidth than the second carrier. In one embodiment, the modulator includes a tree-structured array of filter banks having M leaf nodes, each of the values related to the symbols forming an input to a corresponding one of the leaf nodes. Each of the nodes includes one of the filter banks. Similarly, the receiver can be constructed of a tree-structured array of sub-band filter banks for converting M time-domain samples received on the communication link to M symbol values.

A stereophonic audio signal decompression method that includes decoding, using a decoder, a compressed stereophonic audio signal. A de-quantizer de-quantizes the compressed stereophonic audio signal to generate sets of frequency components for synthesizing left and right audio signals. A controller switches to constructing a single set of frequency components by averaging corresponding frequency components in the left and right audio signals when a computational workload exceeds a capacity of a decompression system and a synthesizer synthesizes a monaural audio time domain signal.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. Reissue application Ser. No. 10/994,925, now Reissue Pat. No. RE 40,281, which is a Division of U.S. Reissue application Ser. No. 10/603,833 filed Jun. 26, 2003, now abandoned, which is a Reissue of U.S. application Ser. No. 08/804,909, filed Feb. 25, 1997, now U.S. Pat. No. 6,252,909, issued Jun. 26, 2001. U.S. application Ser. No. 08/804,909, filed Feb. 25, 1997, is a Continuation-in-Part of U.S. patent application Ser. No. 08/307,331, filed Sep. 16, 1994, Pat. No. 5,606,642, which is a division of U.S. Patent Application Ser. No. 07/948,147, filed Sep. 21, 1992, Pat. No. 5,408,580.

FIELD OF THE INVENTION

The present invention relates to data transmission systems, and more particularly, to an improved multi-carrier transmission system. The present invention further relates to audio compression and decompression systems.

BACKGROUND OF THE INVENTION

While digital audio recordings provide many advantages over analog systems, the data storage requirements for high-fidelity recordings are substantial. A high fidelity recording typically requires more than one million bits per second of playback time. The total storage needed for even a short recording is too high for many computer applications. In addition, the digital bit rates inherent in non-compressed high fidelity audio recordings make the transmission of such audio tracks over limited bandwidth transmission systems difficult. Hence, systems for compressing audio sound tracks to reduce the storage and bandwidth requirements are in great demand.

One class of prior art audio compression systems divides the sound track into a series of segments. Over the time interval represented by each segment, the sound track is analyzed to determine the signal components in each of a plurality of frequency bands. The measured components are then replaced by approximations requiring fewer bits to represent, but which preserve features of the sound track that are important to a human listener. At the receiver, an approximation to the original sound track is generated by reversing the analysis process with the approximations in place of the original signal components.

The analysis and synthesis operations are normally carried out with the aid of perfect, or near perfect, reconstruction filter banks. The systems in question include an analysis filter bank which generates a set of decimated subband outputs from a segment of the sound track. Each decimated subband output represents the signal in a predetermined frequency range. The inverse operation is carried out by a synthesis filter bank which accepts a set of decimated subband outputs and generates therefrom a segment of audio sound track. In practice, the synthesis and analysis filter banks are implemented on digital computers which may be general purpose computers or special computers designed to more efficiently carry out the operations. If the analysis and synthesis operations are carried out with sufficient precision, the segment of audio sound track generated by the synthesis filter bank will match the original segment of audio sound track that was inputted to the analysis filter bank. The differences between the reconstructed audio sound track and the original sound track can be made arbitrarily small. In this case, the specific filter bank characteristics such as the length of the segment analyzed, the number of filters in the filter bank, and the location and shape of filter response characteristics would be of little interest, since any set of filter banks satisfying the perfect, or near-perfect, reconstruction condition would exactly regenerate the audio segment.

Unfortunately, the replacement of the frequency components generated by the analysis filter bank with a quantized approximation thereto results in artifacts that do depend on the detailed characteristics of the filter banks. There is no single segment length for which the artifacts in the reconstructed audio track can be minimized. Hence, the length of the segments analyzed in prior art systems is chosen to be a compromise. When the frequency components are replaced by approximations, an error is introduced in each component. An error in a given frequency component produces an acoustical effect which is equivalent to the introduction of a noise signal with frequency characteristics that depend on the filter characteristics of the corresponding filter in the filter bank. The noise signal will be present over the entire segment of the reconstructed sound track. Hence, the length of the segments is reflected in the types of artifacts introduced by the approximations. If the segment is short, the artifacts are less noticeable. Hence, short segments are preferred. However, if the segment is too short, there is insufficient spectral resolution to acquire the information needed to properly determine the minimum number of bits needed to represent each frequency component. On the other hand, if the segment is too long, the temporal resolution of the human auditory system will detect artifacts.

Prior art systems also utilize filter banks in which the frequency bands are uniform in size. Systems with a few (16-32) sub-bands in a 0-22 kHz frequency range are generally called “subband coders” while those with a large number of sub-bands (≥64) are called “transform coders”. It is known from psychophysical studies of the human auditory system that there are critical bandwidths which vary with frequency. The information in a critical band may be approximated by a component representing the time averaged signal amplitude in the critical band.

In addition, the ear's sensitivity to a noise source in the presence of a localized frequency component such as a sine tone depends on the relative levels of the signals and on the relation of the noise spectral components to the tone. The errors introduced by approximating the frequency components may be viewed as “noise”. The noise becomes significantly less audible if its spectral energy is within one critical bandwidth of the tone. Hence, it is advantageous to use frequency decompositions which approximate the critical band structure of the auditory system.

Filter banks which utilize uniform frequency bands are poorly suited for systems designed to take advantage of this type of approximation. In principle, each audio segment can be analyzed to generate a large number of uniform frequency bands, and then, several bands at the higher frequencies could be merged to provide a decomposition into critical bands. This approach imposes the same temporal constraints on all frequency bands. That is, the time window over which the low frequency data is generated for each band is the same as the time window over which each high-frequency band is generated. To provide accuracy in the low frequency ranges, the time window must be very long. This leads to temporal artifacts that become audible at higher frequencies. Hence, systems in which the audio segment is decomposed into uniform sub-bands with adequate low-frequency resolution cannot take full advantage of the critical band properties of the auditory system.

Prior art systems that recognize this limitation have attempted to solve the problem by utilizing analysis and synthesis filter banks based on QMF filter banks that analyze a segment of an audio sound track to generate frequency components in two frequency bands. To obtain a decomposition of the segment into frequency components representing the amplitudes of the signal in critical bands, these two frequency band QMF filters are arranged in a tree-structured configuration. That is, each of the outputs of the first level filter becomes the input to another filter bank at least one of whose two outputs is fed to yet another level, and so on. The leaf nodes of this tree provide an approximation to a critical band analysis of the input audio track. It can be shown that this type of filter bank uses different length audio segments to generate the different frequency components. That is, a low frequency component represents the signal amplitude in an audio segment that is much longer than that of a high-frequency component. Hence, the need to choose a single compromise audio segment length is eliminated.

While tree structured filter banks having many layers may be used to decompose the frequency spectrum into critical bands, such filter banks introduce significant aliasing artifacts that limit their utility. In a multilevel filter bank, the aliasing artifacts are expected to increase exponentially with the number of levels. Hence, filter banks with large numbers of levels are to be avoided. Unfortunately, filter banks based on QMF filters which divide the signal into two bandlimited signals require large numbers of levels.

Prior art audio compression systems are also poorly suited to applications in which the playback of the material is to be carried out on a digital computer. The use of audio for computer applications is increasingly in demand. Audio is being integrated into multimedia applications such as computer based entertainment, training, and demonstration systems. Over the course of the next few years, many new personal computers will be outfitted with audio playback and recording capability. In addition, existing computers will be upgraded for audio with the addition of plug-in peripherals.

Computer based audio and video systems have been limited to the use of costly outboard equipment such as an analog laser disc player for playback of audio and video. This has limited the usefulness and applicability of such systems. With such systems it is necessary to provide a user with a highly specialized playback configuration, and there is no possibility of distributing the media electronically. However, personal computer based systems using compressed audio and video data promise to provide inexpensive playback solutions and allow distribution of program material on digital disks or over a computer network.

Until recently, the use of high quality audio on computer platforms has been limited due to the enormous data rate required for storage and playback. Quality has been compromised in order to store the audio data conveniently on disk. Although some increase in performance and some reduction in bandwidth has been gained using conventional audio compression methods, these improvements have not been sufficient to allow playback of high fidelity recordings on the commonly used computer platforms without the addition of expensive special purpose hardware.

One solution to this problem would be to use lower quality playback on computer platforms that lack the computational resources to decode compressed audio material at high fidelity quality levels. Unfortunately, this solution requires that the audio material be coded at various quality levels. Hence, each audio program would need to be stored in a plurality of formats. Different types of users would then be sent the format suited to their application. The cost and complexity of maintaining such multi-format libraries makes this solution unattractive. In addition, the storage requirements of the multiple formats partially defeat the basic goal of reducing the amount of storage needed to store the audio material.

Furthermore, the above discussion assumes that the computational resources of a particular playback platform are fixed. This assumption is not always true in practice. The computational resources of a computing system are often shared among a plurality of applications that are running in a time-shared environment. Similarly, communication links between the playback platform and shared storage facilities also may be shared. As the playback resources change, the format of the audio material must change in systems utilizing a multi-format compression approach. This problem has not been adequately solved in prior art systems.

In prior art multi-carrier systems, a communication path having a fixed bandwidth is divided into a number of sub-bands having different frequencies. The width of the sub-bands is chosen to be the same for all sub-bands and small enough to allow the distortion in each sub-band to be modeled by a single attenuation and phase shift for the band. If the noise level in each band is known, the volume of data sent in each band may be maximized for any given bit error rate by choosing a symbol set for each channel having the maximum number of symbols consistent with the available signal-to-noise ratio of the channel. By using each sub-band at its maximum capacity, the amount of data that can be transmitted in the communication path for a given error rate is maximized.

For example, consider a system in which one of the sub-channels has a signal-to-noise ratio which allows at least 16 digital levels to be distinguished from one another with an acceptable error rate. In this case, a symbol set having 16 possible signal values is chosen. If the incoming data stream is binary, each consecutive group of 4 bits is used to compute the corresponding symbol value which is then sent on the communication channel in the sub-band in question.
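The grouping of bits into symbols described above can be sketched as follows. This is only an illustration: the function name `bits_to_symbols` and the most-significant-bit-first ordering are assumptions made for the example and are not taken from the specification.

```python
def bits_to_symbols(bits, bits_per_symbol=4):
    """Group a binary stream into symbols with 2**bits_per_symbol levels.

    Hypothetical helper: the bit ordering (most significant bit first)
    is an assumption made for this sketch.
    """
    if len(bits) % bits_per_symbol != 0:
        raise ValueError("bit stream length must be a multiple of bits_per_symbol")
    symbols = []
    for start in range(0, len(bits), bits_per_symbol):
        value = 0
        for bit in bits[start:start + bits_per_symbol]:
            value = (value << 1) | bit  # shift in the next bit
        symbols.append(value)
    return symbols


# Two groups of 4 bits yield two symbols, each with 16 possible values.
stream = [1, 0, 1, 1, 0, 0, 0, 1]
symbols = bits_to_symbols(stream)
```

A sub-channel with a poorer signal-to-noise ratio would simply use a smaller `bits_per_symbol`.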

In digitally implemented multi-carrier systems, the actual synthesis of the signal representing the sum of the various modulated carriers is carried out via a mathematical transformation that generates a sequence of numbers that represents the amplitude of the signal as a function of time. For example, a sum signal may be generated by applying an inverse Fourier transformation to a data vector generated from the symbols to be transmitted in the next time interval. Similarly, the symbols are recovered at the receiver using the corresponding inverse transformation.
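A minimal sketch of this transform pair is given below, assuming an ideal, noise-free link. It uses a direct discrete Fourier transform written out in full rather than a fast algorithm, and it allows complex-valued time samples for simplicity. The function names and the example symbol values are hypothetical.

```python
import cmath


def inverse_dft(symbols):
    """Transmitter side: frequency-domain symbols -> time-domain samples."""
    m = len(symbols)
    return [
        sum(s * cmath.exp(2j * cmath.pi * k * n / m) for k, s in enumerate(symbols)) / m
        for n in range(m)
    ]


def forward_dft(samples):
    """Receiver side: time-domain samples -> frequency-domain symbols."""
    m = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * n / m) for n, x in enumerate(samples))
        for k in range(m)
    ]


# One frame of M = 4 symbols, one per carrier (values chosen arbitrarily).
frame = [3 + 0j, -1 + 2j, 0.5 - 1j, 2 + 0j]
samples = inverse_dft(frame)      # what would be clocked onto the link
recovered = forward_dft(samples)  # what the receiver computes
```

On an ideal link `recovered` equals `frame` up to rounding error; on a real link the samples are attenuated and phase shifted, so the recovered values must be corrected before decoding.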

The computational workload inherent in synthesizing and analyzing the multi-carrier signal is related to the number of sub-bands. For example, if Fourier transforms are utilized, the workload is of order N log N where N is the number of sub-bands. Similar relationships exist for other transforms. Hence, it is advantageous to minimize the number of sub-bands.

There are two factors that determine the number of sub-bands in prior art systems. First, the prior art systems utilize a uniform bandwidth. Hence, the number of sub-bands is at least as great as the total bandwidth available for transmission divided by the bandwidth of the smallest sub-band. The size of the smallest sub-band is determined by the need to characterize each channel by a single attenuation and phase shift. Thus, the sub-band having the most rapidly varying distortion sets the number of sub-bands and the computational workload in the case in which white noise is the primary contributor to the signal-to-noise ratio.

In systems in which the major source of interference is narrow band interference, the minimum sub-band is set with reference to the narrowest sub-band that must be removed from the communication channel to avoid the interference. Consider a communication channel consisting of a twisted pair of wires which is operated at a total communication band which overlaps with the AM broadcast band in frequency. Because of the imperfect shielding of the wires, interference from strong radio stations will be picked up by the twisted pair. Hence, the sub-bands that correspond to these radio signals are not usable. In this case, prior art systems break the communication band into a series of uniform sub-bands in which certain sub-bands are not used. Ideally, the sub-bands are sufficiently narrow that only the portion of the spectrum that is blocked by a radio signal is lost when a sub-band is marked as being unusable.

Broadly, it is the object of the present invention to provide an improved multi-carrier transmission system.

It is a further object of the present invention to provide a multi-carrier transmission system having a lower computational workload than imposed by systems having bands of equal band-width.

These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.

SUMMARY OF THE INVENTION

The present invention comprises audio compression and decompression systems. An audio compression system according to the present invention converts an audio signal into a series of sets of frequency components. Each frequency component represents an approximation to the audio signal in a corresponding frequency band over a time interval that depends on the frequency band. The received audio signal is analyzed in a tree-structured sub-band analysis filter. The sub-band analysis filter bank comprises a tree-structured array of sub-band filters, the audio signal forming the input of the root node of the tree-structured array and the frequency components being generated at the leaf nodes of the tree-structured array. Each of the sub-band filter banks comprises a plurality of FIR filters having a common input for receiving an input audio signal. Each filter generates an output signal representing the input audio signal in a corresponding frequency band, the number of FIR filters in at least one of the sub-band filter banks is greater than two, and the number of said FIR filters in at least one of the sub-band filters is different than the number of FIR filters in another of the sub-band filters. The frequency components generated by the sub-band analysis filter are then quantized using information about the masking features of the human auditory system.

A decompression system according to the present invention regenerates a time-domain audio signal from the sets of frequency components such as those generated by a compression system according to the present invention. The decompression system receives a compressed audio signal comprising sets of frequency components, the number of frequency components in each set being M. The decompression apparatus synthesizes M time domain audio signal values from each received set of frequency components. The synthesis sub-system generates 2M polyphase components from the set of frequency components. Then it generates a W entry array from the polyphase components and multiplies each entry in the array by a corresponding weight value derived from a prototype filter. The time domain audio samples are then generated from the weighted array. The generated samples are stored in a FIFO buffer and outputted to a D/A converter. The FIFO buffer generates a signal indicative of the number of time domain audio signal values stored therein. The rate at which these sample values are outputted to the D/A converters is determined by a clock. The preferred embodiment of the decompression system includes a controller that uses the level indicator in the FIFO buffer or other operating system loading parameter to adjust the computational complexity of the algorithm used to synthesize the time domain samples. When the level indicator indicates that the number of time domain samples stored in the FIFO buffer is less than a first predetermined value, the normal synthesis operation is replaced by one that generates an approximation to the time domain samples. This approximation requires a smaller number of computations than would be required to generate the time domain audio signal values. The approximation may be generated by substituting a truncated or shorter prototype filter or by eliminating the contributions of selected frequency components from the computation of the polyphase components. In stereophonic systems, the controller may also switch the synthesis system to a monaural mode based on average frequency components which are obtained by averaging corresponding frequency components for the left and right channels.
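The controller decision and the left/right averaging described above might be sketched as follows. The names `choose_mode`, `average_channels`, and `low_water_mark`, as well as the numeric values, are hypothetical; the specification only states that a FIFO level below a first predetermined value triggers the cheaper computation and that the monaural mode uses averaged frequency components.

```python
def choose_mode(fifo_level, low_water_mark):
    """Pick the synthesis mode from the FIFO level indicator.

    Assumption for this sketch: a single threshold selects between
    full stereo synthesis and the cheaper monaural synthesis.
    """
    return "mono" if fifo_level < low_water_mark else "stereo"


def average_channels(left, right):
    """Average corresponding left and right frequency components."""
    if len(left) != len(right):
        raise ValueError("left and right sets must hold the same number of components")
    return [(l + r) / 2 for l, r in zip(left, right)]


# One set of M = 3 frequency components per channel (values are arbitrary).
left_components = [0.5, -0.25, 1.0]
right_components = [0.25, 0.25, 0.0]

mode = choose_mode(fifo_level=100, low_water_mark=256)
if mode == "mono":
    # Only one set of components has to be synthesized instead of two.
    mono_components = average_channels(left_components, right_components)
```

Because a single averaged set is synthesized in place of two, the synthesis workload in this mode is roughly halved, which gives the FIFO buffer a chance to refill.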

The present invention is also directed toward a communication system for sending a sequence of symbols on a communication link. The system includes a transmitter for placing information indicative of the sequence of symbols on the communication link and a receiver for receiving the information placed on the communication link by the transmitter. The transmitter includes a clock for defining successive frames, each of the frames including M time intervals, where M is an integer greater than 1. A modulator modulates each of the M carrier signals with a signal related to the value of one of the symbols thereby generating a modulated carrier signal corresponding to each of the carrier signals. The modulated carriers are combined into a sum signal which is transmitted on the communication link. The carrier signals include first and second carriers, the first carrier having a different bandwidth than the second carrier. In one embodiment, the modulator includes a tree-structured array of filter banks having M leaf nodes, each of the values related to the symbols forming an input to a corresponding one of the leaf nodes. Each of the nodes includes one of the filter banks. Similarly, the receiver can be constructed of a tree-structured array of sub-band filter banks for converting M time-domain samples received on the communication link to M symbol values.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a typical prior art multicarrier transceiver.

FIG. 2 is a block diagram of a filter bank for performing the time-domain to frequency-domain transformation utilized by the present invention.

FIG. 3 is a block diagram of a filter bank for performing the frequency-domain to time-domain transformation utilized by the present invention.

FIG. 4 is a block diagram of an audio compression system.

FIG. 5 is a block diagram of a sub-band decomposition filter according to the present invention.

FIG. 6 illustrates the relationship between the length of the segment of the original audio signal used to generate the frequency of each sub-band and the bandwidth of each band.

FIG. 7 illustrates the relationship between successive overlapping segments of an audio signal.

FIG. 8(a) is a block diagram of an audio filter based on a low-frequency filter and a modulator.

FIG. 8(b) is a block diagram of a sub-band analysis filter for generating a set of frequency components.

FIG. 9 illustrates the manner in which a sub-band analysis filter may be utilized to obtain the frequency information needed for psycho-acoustical analysis of the audio signal prior to quantization.

FIG. 10 is a block diagram of an audio decompression system for decompressing the compressed audio signals generated by a compression system.

FIG. 11 is a block diagram of a synthesizer according to the present invention.

FIG. 12 is a block diagram of an audio decompression system utilizing the variable computational load techniques of the present invention.

FIG. 13 is a block diagram of a stereophonic decompression system according to the present invention.

FIG. 14 is a block diagram of a stereophonic decompression system according to the present invention using a serial computation system.

FIG. 15 is a block diagram of an audio compression apparatus utilizing variable computational complexity.

FIG. 16 is a schematic view of a second embodiment of a synthesis filter bank that may be used with the present invention to generate a frequency-domain to time-domain transformation.

FIG. 17 is a schematic view of a second embodiment of an analysis filter bank that may be used with the present invention to generate a time-domain to frequency-domain transformation.

DETAILED DESCRIPTION OF THE INVENTION

The manner in which the present invention operates can be more easily understood with reference to FIG. 1 which is a block diagram of a typical prior art multi-carrier transceiver 100. Transceiver 100 transmits data on a communication link 113. The input data stream is received by a symbol generator 102 which converts a run of data bits from the input stream into M symbols S₁, S₂, . . . , S_(M) which are stored in a register 104. The number of possible states for each symbol will depend on the noise levels in the corresponding frequency band on the transmission channel 113 and on the error rate that can be tolerated by the data. For the purposes of the present discussion, it is sufficient to note that each symbol is a number whose absolute value may vary from 0 to some predetermined upper bound. For example, if a symbol has 16 possible values, this symbol can be used to represent 4 bits in the input data stream.

Transceiver 100 treats the symbols S_(i) as if they were the amplitude of a signal in a narrow frequency band. Frequency to time-domain transform circuit 106 generates a time-domain signal X_(i), for i from 0 to M−1, that has the frequency components S_(i). The time-domain signals are stored in a shift register 108. The contents of shift register 108 represent, in digital form, the next segment of the signal that is to be actually transmitted over communication link 113. The actual transmission is accomplished by clocking the digital values onto transmission link 113 (possibly after upconversion to radio frequencies) after converting the values to analog voltages using D/A converter 110. Clock 107 provides the timing pulses for the operation. The output of D/A converter 110 is low-pass filtered by filter 112 before being placed on communication link 113.

At the receiving end of transmission link 113, the transmission segment is recovered. The signals received on communication link 113 are low-pass filtered to reduce the effects of high frequency noise transients. The signals are then digitized and shifted into a register 118. When M values have been shifted into register 118, the contents thereof are converted via a time-domain to frequency-domain transform circuit 120 to generate a set of frequency-domain symbols S′_(i). This transformation is the inverse of the transformation generated by frequency to time-domain transform 106. It should be noted that communication link 113 will, in general, both attenuate and phase shift the signal represented by the X_(i). Hence, the signal values received at low-pass filter 114 and A/D converter 116 will differ from the original signal values. Thus, the contents of shift register 118 will not match the corresponding values from shift register 108. For this reason, the contents of shift register 118 are denoted by X′_(i). Similarly, the output of the time to frequency-domain transform will also differ from the original symbols S_(i); hence, the contents of register 122 are denoted by S′_(i). Equalizer 124 corrects the S′_(i) for the attenuation and phase shift resulting from transmission over communication link 113 to recover the original symbols which are stored in buffer 126. In addition, equalizer 124 corrects the symbols for intersymbol interference arising from synchronization errors between the transmitter and receiver. Finally, the contents of buffer 126 are decoded to regenerate the original data stream by symbol decoder 128.
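The attenuation and phase-shift correction performed by an equalizer such as equalizer 124 can be illustrated with a one-coefficient-per-sub-channel sketch. It assumes that each sub-channel's distortion is a single known complex gain and it ignores noise and the intersymbol interference correction mentioned above. The function name and the gain values are hypothetical.

```python
import cmath


def equalize(received, channel_gains):
    """Undo a per-sub-channel attenuation and phase shift.

    Each gain is a complex number whose magnitude is the attenuation and
    whose angle is the phase shift of that sub-channel (assumed known).
    """
    return [r / g for r, g in zip(received, channel_gains)]


# Original symbols S_i and an assumed channel response per sub-channel.
sent = [1 + 1j, -3 + 0j, 0 - 2j]
gains = [cmath.rect(0.8, 0.3), cmath.rect(0.5, -1.1), cmath.rect(0.25, 2.0)]

# What the receiver sees (S'_i) and what the equalizer restores.
received = [s * g for s, g in zip(sent, gains)]
restored = equalize(received, gains)
```

The single division per symbol is only valid when the sub-channel is narrow enough for one attenuation and one phase shift to describe it, which is the constraint on sub-channel width discussed in the text.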

For efficient design of the equalizer 124 in FIG. 1, each subchannel must be sufficiently narrow to allow the distortions in that subchannel to be modeled by a single phase shift and attenuation. Sub-channels must also be sufficiently narrow to assure that a sub-channel that is turned off to prevent interference from narrow band sources does not unduly waste bandwidth beyond that corrupted by the interference source. However, using narrower channels across the transmission band increases both system latency and the computational complexity of the frequency-domain-to-time-domain transformation and its inverse. The present invention is based on the observation that the variation in the attenuation and phase shift as a function of frequency is greater at low frequencies than at higher frequencies for communication links consisting of twisted pairs or coaxial cable. Thus, it is advantageous from a computational complexity viewpoint to employ narrower subchannels at the low frequencies and wider subchannels at the higher frequencies in a multicarrier modulation system.

To implement a variable channel width system, a transformation that breaks the available frequency band into sub-bands of differing width is required. Such a transformation may be constructed from a tree configured filter bank discussed hereinafter. Tree configured filters are known in the audio compression arts. For example, U.S. Pat. No. 5,408,580, which is hereby incorporated by reference, describes such an analysis. Specifically, the transformation splits an audio signal into frequency components representing the audio signal in different frequency bands utilizing such a filter. The frequency bands vary in width such that the lower frequency bands are divided into smaller bands than the higher frequency bands.

Refer now to FIG. 2 which illustrates the decomposition of a signal into frequency sub-bands by a tree structured filter 30. Such a filter could be utilized to implement the time-domain to frequency-domain transformation 120 shown in FIG. 1. Filter 30 includes two levels of filter banks. The manner in which the filter banks are constructed will be discussed in more detail below. In the example shown in FIG. 2, 22 sub-bands are utilized. The decomposition is carried out in two levels of filters. The first level of filter 30 consists of a filter bank 31 which divides the input signal into eight sub-bands of equal size. The second level subdivides the lowest three frequency bands from filter bank 31 into finer subdivisions. The second level consists of three filter banks 32-34. Filter bank 32 divides the lowest sub-band from filter bank 31 into 8 equal sub-bands. Filter bank 33 and filter bank 34 divide the second and third sub-bands created by filter bank 31 into five and four sub-bands respectively. The combination of the two levels generates 22 frequency sub-bands. When applying the tree-structured filter bank to multicarrier communications, the analysis filter bank is used to demodulate the received signal. The filter bank performs a time-domain to frequency-domain transformation, converting received signal amplitudes into demodulated symbols for subsequent equalization.
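The band count in this example can be checked with a short sketch that lists the edges of each sub-band as a fraction of the total bandwidth. It assumes an idealized layout in which the bands are numbered from low to high frequency and each second-level bank splits its input band evenly. The function name is hypothetical.

```python
from fractions import Fraction


def tree_band_edges(first_level=8, second_level=(8, 5, 4)):
    """Return (low, high) band edges as fractions of the total bandwidth.

    second_level[i] is the number of equal sub-bands into which the i-th
    lowest first-level band is split; remaining bands are left whole.
    """
    width = Fraction(1, first_level)
    edges = []
    for index in range(first_level):
        low = index * width
        splits = second_level[index] if index < len(second_level) else 1
        sub_width = width / splits
        for k in range(splits):
            edges.append((low + k * sub_width, low + (k + 1) * sub_width))
    return edges


bands = tree_band_edges()
narrowest = min(high - low for low, high in bands)  # 1/64 of the band
widest = max(high - low for low, high in bands)     # 1/8 of the band
```

The list holds 8 + 5 + 4 = 17 narrow bands followed by 5 undivided bands, for 22 in total. A uniform decomposition with the same finest resolution would need 64 bands.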

The reverse transformation can be performed by an analogous filter bank such as shown in FIG. 3 at 60. Filter 60 provides the frequency-domain to time-domain transformation shown in FIG. 1. The reverse transformation also utilizes a two level tree structure. The symbols to be sent on the finer sub-bands are first combined using a first set of synthesis filters shown at 62-64 to provide signals representing three larger sub-bands of the same width as bands 18-22. These “symbols” together with those from bands 18-22 are then combined by synthesis filter 61 to provide the time-domain output signal that is sent on the communication link.

The manner in which the individual filters are constructed is explained in detail in U.S. Pat. No. 5,408,580, and hence will not be discussed in detail here.

The manner in which the present invention obtains its advantages over prior art audio compression systems may be more easily understood with reference to the manner in which a conventional audio compression system operates. FIG. 4 is a block diagram of an audio compression system 10 using a conventional sub-band analysis system. The audio compression system accepts an input signal 11 which is divided into a plurality of segments 19. Each segment is analyzed by a filter bank 12 which provides the frequency components for the segment. Each frequency component is a time average of the amplitude of the signal in a corresponding frequency band. The time average is, in general, a weighted average. The frequencies of the sub-bands are uniformly distributed between a minimum and maximum value which depend on the number of samples in each segment 19 and the rate at which samples are taken. The input signal is preferably digital in nature; however, it will be apparent to those skilled in the art that an analog signal may be used by including an analog-to-digital converter prior to filter bank 12.

The component waveforms generated by filter bank 12 are replaced by digital approximations by quantizer 14. The number of bits assigned to each amplitude is determined by a psycho-acoustic analyzer 16 which utilizes information about the auditory system to minimize the distortions introduced by the quantization. The quantized frequency components are then further coded by coder 18 which makes use of the redundancy in the quantized components to further reduce the number of bits needed to represent the coded coefficients. Coder 18 does not introduce further errors into the frequency components. Coding algorithms are well known to those skilled in the signal compression arts, and hence, will not be discussed in more detail here.

The quantization process introduces errors into the frequency coefficients. A quantization scheme replaces the amplitude of each frequency component by an integer having a finite precision. The number of bits used to represent the integers will be denoted by P. The integers in question are then transmitted in place of the individual frequency components. At the receiver, the inverse of the mapping used to assign the integer values to the frequency components is used to produce amplitudes that are used in place of the original amplitudes for the frequency components. There are at most 2^P distinct values that can be represented; hence, if there are more than 2^P different frequency component values, at least some of the frequency components will not be exactly recovered. The goal of the quantization algorithm is to minimize the overall effect of the quantization errors on the listener.
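The forward and inverse mappings described above can be illustrated with a minimal sketch. A uniform quantizer is assumed here purely for illustration; the function names, the `max_abs` range parameter, and the choice of interval midpoints for reconstruction are hypothetical and are not taken from the specification.

```python
def quantize(amplitude, p_bits, max_abs):
    """Map an amplitude in [-max_abs, max_abs] to one of 2**p_bits integer codes."""
    levels = 2 ** p_bits
    step = 2.0 * max_abs / levels
    code = int((amplitude + max_abs) / step)
    # Clamp so that out-of-range inputs and +max_abs still map to a valid code.
    return min(max(code, 0), levels - 1)


def dequantize(code, p_bits, max_abs):
    """Inverse mapping: return the midpoint of the interval that `code` represents."""
    step = 2.0 * max_abs / 2 ** p_bits
    return -max_abs + (code + 0.5) * step


# Six distinct amplitudes squeezed into P = 3 bits (at most 8 distinct codes).
amplitudes = [-0.9, -0.31, 0.0, 0.3, 0.33, 0.77]
codes = [quantize(a, 3, 1.0) for a in amplitudes]
recovered = [dequantize(c, 3, 1.0) for c in codes]
```

In this sketch the amplitudes 0.3 and 0.33 receive the same code, so they cannot both be recovered exactly, which is the loss the paragraph above describes.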

The errors introduced by the quantization algorithm affect the reconstructed audio track for a time period equal to the length of the segment analyzed to calculate the frequency components. The artifacts introduced by these errors are particularly noticeable in regions of the audio track in which the sound increases or decreases in amplitude over a period of time which is short compared to the length of the segments being analyzed. Because of the rapid rise, the set of frequency components of the audio track in the segment will have a number of high-frequency components of significant amplitude which are not present in the segments on either side of the segment in question. Consider a quantization error in one of these high-frequency components. The error is equivalent to adding noise to the original signal. The amplitude of the noise will be determined by the quantization error. This noise will be present for the entire length of the segment in the reconstructed audio track. The noise resulting from the quantization error commences at the boundary of the segment even though the attack begins in the middle of the segment. The amplitude of the noise in the early part of the segment may be of the same order of magnitude as the reconstructed audio track; hence, the noise will be particularly noticeable. Since the noise precedes the actual rise in intensity of the audio track, it is perceived as a “pre-echo”. If the segment duration is long compared to the rise time of the audio signal, the pre-echo is particularly noticeable. Hence, it would be advantageous to choose filter bands in which the high-frequency components are calculated from segments that are shorter than those used to calculate the low-frequency components. This arrangement avoids the situation in which the segment used to compute high-frequency components is long compared to the rate of change of the component being computed.

Low bit rate audio compression systems operate by distributing the noise introduced by quantization so that it is masked by the signal. The ear's sensitivity to a noise source in the presence of a localized frequency component such as a sine tone depends on the relative levels of the signals and on the relation of the noise spectral components to the tone. The noise becomes significantly less audible if its spectral energy is within one critical bandwidth of the tone. Hence, it would be advantageous to choose filter bands that more closely match the critical bands of the human auditory system.

The present invention utilizes a filter bank in which different frequency bands utilize different segment lengths. In prior art systems, each segment is analyzed in a bank of finite impulse response filters. The number of samples in the input segment over which each frequency component is computed is the same. The present invention uses different width segments for the different frequency components. Ideally, an audio decomposition should exhibit a time and frequency dependency similar to that of human hearing. This may be accomplished by relating the frequency divisions or sub-bands of the decomposition to the critical bandwidths of human hearing. The resulting decomposition has fine frequency resolution with relatively poor temporal resolution at low frequencies, and coarse frequency resolution with fine temporal resolution at high frequencies. As a result, the segment length corresponding to high-frequency components does not greatly exceed the rise time of attacks in the audio track. This reduces the pre-echo artifacts discussed above.

In one embodiment of the present invention, a tree structured decomposition which approximates the ear's time and frequency sensitivity is utilized. This filter may be used to replace sub-band analysis filter bank 12 shown in FIG. 4. A block diagram of a sub-band decomposition filter for carrying out this decomposition is shown at 30 in FIG. 5. Filter 30 includes two levels of filter banks. The manner in which the filter banks are constructed will be discussed in more detail below. For the purposes of the present discussion, it is important to note that the decomposition is carried out with only two levels of filters, and hence, avoids the aliasing problems inherent in QMF filter banks that require many levels. The aliasing problems encountered with QMF filter banks become significant when the number of levels exceeds 4.

The first level of filter 30 consists of a filter bank 31 which divides the input signal into eight sub-bands of equal size. The second level sub-divides the lowest three frequency bands from filter bank 31 into finer sub-divisions. The second level consists of three filter banks 32-34. Filter bank 32 divides the lowest sub-band from filter bank 31 into 8 equal sub-bands. Filter bank 33 and filter bank 34 divide the second and third sub-bands created by filter bank 31 into four sub-bands. The combination of the two levels generates 21 frequency sub-bands. The relationship between the length of the segment of the original audio signal used to generate the frequency and phase of each sub-band and the bandwidth of each band is shown schematically in FIG. 6. The lower frequencies, bands 1-8, have the finest frequency resolution, but the poorest temporal resolution. The highest frequencies, bands 17-21, have the poorest frequency resolution, but the finest time resolution. This arrangement more nearly approximates the ear's sensitivity than systems utilizing filter banks in which all bands have the same temporal resolution, while avoiding the aliasing problems inherent in tree-structured filters having many levels of filters.
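The band layout just described can be tabulated with a short sketch. The helper `tree_bands` and its argument names are hypothetical; band edges are expressed as fractions of the total analyzed bandwidth, which is an illustrative normalization rather than anything stated in the specification.

```python
from fractions import Fraction


def tree_bands(first_level, splits):
    """Return (low, high) edges for a two-level tree of uniform filter banks.

    first_level: number of equal bands produced by the first-level bank.
    splits: splits[i] is the number of equal sub-bands that first-level
            band i is divided into; bands beyond len(splits) stay unsplit.
    """
    bands = []
    width = Fraction(1, first_level)
    for i in range(first_level):
        n = splits[i] if i < len(splits) else 1
        for j in range(n):
            low = i * width + j * width / n
            bands.append((low, low + width / n))
    return bands


# Lowest band split into 8, second and third bands split into 4 each.
bands = tree_bands(8, [8, 4, 4])
widths = [high - low for low, high in bands]
```

Under these assumptions the three groups of bands have widths in the ratio 1:2:8, which mirrors the coarser frequency resolution (and correspondingly finer time resolution) of the higher bands.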

While quantization errors in each of the amplitudes still introduce noise, the noise spectrum obtained with this embodiment of the present invention is less objectionable to a human listener than that obtained with prior art systems. As noted above, prior art systems tend to have a noise spectrum which changes abruptly at the segment boundaries. In the present invention, the amplitude of the quantization noise can switch more rapidly at higher frequencies. If the length of the low frequency segments is denoted by T, then the medium frequencies are measured on segments that are T/2, and the highest frequencies are measured on segments that are T/8 in length. The quantization noise is the sum of all of the quantization errors in all of the frequency bands. As a result, the quantization noise changes every T/8. To obtain the same resolution in the low frequency components, a conventional filter system would measure all of the frequency components on segments of length T. Hence, the prior art would introduce quantization noise which changes abruptly every T samples. The present invention introduces a more gradual change in the noise level in the T/8 interval for the high and medium sub-bands thus giving less perceptible distortion at higher frequencies.

The manner in which the input signal is divided into segments can affect the quality of the regenerated audio signal. Consider the case in which the signal is analyzed on segments that do not overlap. This analysis is equivalent to employing a model in which the regenerated signal is produced by summing the signals of a number of harmonic oscillators whose amplitudes remain constant over the duration of the segment on which each amplitude was calculated. In general, this model is a poor approximation to an actual audio track. In general, the amplitudes of the various frequency components would be expected to change over the duration of the segments in question. Models that do not take this change into account will have significantly greater distortions than models in which the amplitudes can change over the duration of the segment, since there will be abrupt changes in the amplitudes of the frequency components at each segment boundary.

One method for reducing the discontinuities in the frequency component amplitudes at the segment boundaries is to employ a sub-band analysis filter that utilizes overlapping segments to generate successive frequency component amplitudes. The relationship of the segments is shown in FIG. 7 for a signal 301. The sub-band analysis filter generates M frequency components for signal 301 for each M signal values. However, each frequency component is generated over a segment having a duration much greater than M. Each component is generated over a segment having a length of W sample values, where W>M. Typical segments are shown at 312 and 313. It should be noted that successive segments overlap by (W-M) samples.
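The overlapping segmentation can be sketched as follows. The generator name and the small values of W and M are illustrative assumptions only.

```python
def overlapping_segments(signal, w, m):
    """Yield successive W-sample segments whose start advances by M samples."""
    if not 0 < m < w:
        raise ValueError("expected 0 < M < W")
    for start in range(0, len(signal) - w + 1, m):
        yield signal[start:start + w]


signal = list(range(20))
segments = list(overlapping_segments(signal, w=8, m=4))
```

Each new segment shares its first W-M samples with the tail of the previous one, so every group of M new input samples produces one new set of frequency components.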

In the preferred embodiment of the present invention, the various frequency bands in a sub-band analysis filter bank have the same shape but are shifted relative to one another. This arrangement guarantees that all frequency bands have the same aliasing properties. Such a filter bank can be constructed from a single low frequency band pass filter having the desired band shape. The manner in which the various filter bands are constructed is most easily understood with reference to FIG. 8(a) which is a block diagram of a single filter constructed from a low-frequency bandpass filter 377 and a mixer 376. Assume that the low-frequency bandpass filter 377 has a center frequency of Fc and that the desired center frequency of filter 350 is to be F. Then by shifting the input audio signal by a frequency of F-Fc prior to analyzing the signal with low-frequency bandpass filter 377, the output of low-frequency bandpass filter 377 will be the amplitude of the audio signal in a band having a center frequency of F. Modulator 376 accomplishes this frequency shift.

A filter bank can then be constructed from a single prototype low-frequency bandpass filter by using different modulation frequencies to shift the incoming audio signal prior to analysis by the prototype filter. While such a filter bank can be constructed from analog circuit components, it is difficult to obtain filter performance of the type needed. Hence, the preferred embodiment of the present invention utilizes digital filter techniques.

A block diagram of a sub-band analysis filter 350 for generating a set of M frequency components, S_(i), from a W sample window is shown in FIG. 8(b). The M audio samples are clocked into a W-sample shift register 320 by controller 325. The oldest M samples in shift register 320 are shifted out the end of the shift register and discarded. The contents of the shift register are then used to generate 2M polyphase components P_(k), for k=0 to 2M−1. The polyphase components are generated by a windowing operation followed by partial summation. The windowing operation generates a W-component array Z_(i) from the contents of shift register 320 by multiplying each entry in the shift register by a corresponding weight, i.e.,

$$Z_{i} = h_{i}\, x_{i} \tag{1}$$

where the x_(i), for i=0 . . . W−1 are the values stored in shift register 320, and the h_(i) are coefficients of a low pass prototype filter which are stored in controller 325. For those wishing a more detailed explanation of the process for generating sets of filter coefficients, see J. Rothweiler, “POLYPHASE QUADRATURE FILTERS - A NEW SUB-BAND CODING TECHNIQUE” IEEE Proceedings of the 1983 ICASSP Conference, pp 1280-1283. The polyphase components are then generated from the Z_(i) by the following summing operations:

$$P_{k} = \sum_{j=0}^{W/(2M)-1} Z_{k+2Mj} \tag{2}$$

The frequency components, S_(i), are obtained via the following matrix multiplication from the polyphase components:

$$S_{i} = \sum_{k=0}^{2M-1} P_{k} \cos\left[\frac{(2i+1)\left(k - M/2\right)\pi}{2M}\right] \tag{3a}$$

This operation is equivalent to passing the polyphase components through M finite impulse response filters of length 2M. The cosine modulation of the polyphase components shown in Eq. (3a) may be replaced by other such modulation terms. The form shown in Eq. (3a) leads to near-perfect reconstruction. An alternative modulation scheme which allows for perfect reconstruction is as follows:

$$S_{i} = \sum_{k=0}^{2M-1} P_{k} \cos\left[\frac{(2i+1)\left(k + 1 - M/2\right)\pi}{4M}\right] \tag{3b}$$

It can be seen by comparison to FIG. 8(a) that the matrix multiplication provides an operation analogous to the modulation of the incoming audio signal. The windowing operation performs the analysis with the prototype low-frequency filter.
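The three analysis steps (windowing, partial summation, and cosine modulation per Eq. (3a)) can be sketched in software as follows. This is an illustration under stated assumptions, not the disclosed hardware: the function name is hypothetical, the partial sum of Eq. (2) is assumed to run over the W/(2M) entries of Z that share the same index modulo 2M, and the prototype coefficients used in the example are placeholders rather than a designed low-pass prototype.

```python
import math


def analyze(x, h, m):
    """Return M sub-band values S_i from a W-sample window x and prototype h."""
    w = len(x)
    if len(h) != w or w % (2 * m) != 0:
        raise ValueError("need len(h) == len(x) == W with W a multiple of 2M")
    # Eq. (1): windowing.
    z = [h[i] * x[i] for i in range(w)]
    # Eq. (2): partial sums giving 2M polyphase components (assumed limits).
    p = [sum(z[k + 2 * m * j] for j in range(w // (2 * m)))
         for k in range(2 * m)]
    # Eq. (3a): cosine modulation (matrix multiplication).
    return [sum(p[k] * math.cos((2 * i + 1) * (k - m / 2) * math.pi / (2 * m))
                for k in range(2 * m))
            for i in range(m)]


# Placeholder window of ones and a unit impulse as input (M = 4, W = 16).
impulse = [1.0] + [0.0] * 15
s = analyze(impulse, [1.0] * 16, m=4)
```

With a real prototype filter, `h` would hold the W designed coefficients; only the window contents change, not the structure of the computation.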

As will be discussed in more detail below, the computational workload in analyzing and synthesizing audio tracks is of great importance in providing systems that can operate on general purpose computing platforms. It will be apparent from the above discussion that the computational workload inherent in generating M frequency components from a window of W audio sample values is approximately (W+2M²) multiplies and adds. In this regard, it should be noted that a two level filter bank of the type used in the present invention significantly reduces the overall computational workload even in situations in which the frequency spectrum is to be divided into uniform bands. For example, consider a system in which the frequency spectrum is to be divided into 64 bands utilizing a window of 512 samples. If a prior art one level filter bank is utilized, the workload will be approximately 8,704 multiplies and adds. If the filter bank is replaced by a two level filter bank according to the present invention, then the filter bank will consist of 9 filter banks, each dividing the frequency spectrum into 8 bands. The computational workload inherent in this arrangement is only 5,760 multiplies and adds. Hence, a filter bank according to the present invention typically requires less computational capability than a one level filter bank according to the prior art. In addition, a filter bank according to the present invention also provides a means for providing a non-uniform band structure.
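The two workload figures quoted above can be checked with a few lines of arithmetic. The 5,760 figure is reproduced if each of the nine 8-band filter banks is assumed to use a 512-sample window; that per-bank window size is an inference from the stated total rather than something the text states explicitly, and the function name is hypothetical.

```python
def bank_workload(w, m):
    """Approximate multiplies and adds for one filter bank: W + 2*M**2."""
    return w + 2 * m * m


# One-level bank: 64 bands, 512-sample window.
one_level = bank_workload(512, 64)

# Two-level tree: 1 first-level bank plus 8 second-level banks, 8 bands each,
# each assumed to use a 512-sample window.
two_level = 9 * bank_workload(512, 8)
```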

The transformation of the audio signal into sets of frequency components as described above does not, in itself, result in a decrease in the number of bits needed to represent the audio signal. For each M audio samples received by a sub-band analysis filter, M frequency components are generated. The actual signal compression results from the quantization of the frequency components. As noted above, the number of bits that must be allocated to each frequency component is determined by a phenomenon known as “masking”. Consider a tone at a frequency f. The ability of the ear to detect a signal at frequency f′ depends on the energy in the tone and the difference in frequency between the signal and the tone, i.e., (f-f′). Research in human hearing has led to measurements of a threshold function T(E,f,f′) which measures the minimum energy at which the second frequency component can be detected in the presence of the first frequency component with energy E. In general, the threshold function will vary in shape with frequency.

The threshold function is used to construct a masking function as follows. Consider a segment of the incoming audio signal. Denote the energy as a function of frequency in this segment by E(f). Then a mask level, L(f), is constructed by convolving E(f) and T(E,f,f′), i.e.,

$$L(f) = \int T\left(E(f'), f, f'\right) E(f')\, df' \tag{4}$$

Consider the filtered signal value in a band f_(o)±Δf. Denote the minimum value of L in this frequency band by L_(min). It should be noted that L_(min) may depend on frequency components outside the band in question, since a peak in an adjacent band may mask a signal in the band in question.

According to the masking model, any noise in this frequency band that has an energy less than L_(min) will not be perceived by the listener. In particular, the noise introduced by replacing the measured signal amplitude in this band by a quantized approximation therefore will not be perceived if the quantization error is less than L_(min). The noise in question will be less than L_(min) if the signal amplitude is quantized to accuracy equal to S/L_(min), where S is the energy of the signal in the band in question.

The above-described quantization procedure requires a knowledge of the frequency spectrum of the incoming audio signal at a resolution which is significantly greater than that of the sub-band analysis of the incoming signal. In general, the minimum value of the mask function L will depend on the precise location of any peaks in the frequency spectrum of the audio signal. The signal amplitude provided by the sub-band analysis filter measures the average energy in the frequency band; however, it does not provide any information about the specific location of any spectral peaks within the band.

Hence, a more detailed frequency analysis of the incoming audio signal is required. This can be accomplished by defining a time window about each filtered signal component and performing a frequency analysis of the audio samples in this window to generate an approximation to E(f). In prior art systems, the frequency analysis is typically performed by calculating an FFT of the audio samples in the time window.

In one embodiment of a quantization sub-component according to the present invention, this is accomplished by further subdividing each sub-band using another layer of filter banks. The output of each of the sub-band filters in the analysis filter bank is inputted to another sub-band analysis filter which splits the original sub-band into a plurality of finer sub-bands. These finer sub-bands provide a more detailed spectral measurement of the audio signal in the frequency band in question, and hence, can be used to compute the overall mask function L discussed above.

While a separate L_(min) value may be calculated for each filtered signal value from each sub-band filter, the preferred embodiment of the present invention operates on blocks of filtered signal values. If a separate quantization step size is used for each filtered value, then the step size would need to be communicated with each filtered value. The bits needed to specify the step size reduce the degree of compression. To reduce this “overhead”, a block of samples is quantized using the same step size. This approach reduces the number of overhead bits/sample, since the step size need only be communicated once. The blocks of filtered samples utilized consist of a sequential set of filtered signal values from one of the sub-band filters. As noted above, these values can be inputted to a second sub-band analysis filter to obtain a fine spectral measurement of the energy in the sub-band.
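The overhead saving from sharing one step size across a block can be illustrated with a small sketch. The function names and the figure of 6 bits to encode a step size are hypothetical values chosen only for the example; rounding to the nearest multiple of the step is likewise an assumed quantization rule.

```python
def quantize_block(values, step):
    """Quantize every value in the block with one shared step size."""
    return [round(v / step) for v in values]


def dequantize_block(codes, step):
    """Recover approximations from the integer codes and the shared step."""
    return [c * step for c in codes]


def overhead_bits_per_sample(step_bits, block_len):
    """Side-information cost when the step size is sent once per block."""
    return step_bits / block_len


codes = quantize_block([0.26, -0.49, 0.0], step=0.25)
per_value = overhead_bits_per_sample(6, 1)   # step size sent with every value
per_block = overhead_bits_per_sample(6, 8)   # step size sent once per 8 values
```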

One embodiment of such a system is shown in FIG. 9 at 400. The audio signal values are input to a sub-band analysis filter 402 which is similar to that shown in FIG. 5. The filtered outputs are quantized by quantizer 404 in blocks of 8 values. Each set of 8 values leaving sub-band analysis filter 402 is processed by a sub-band analysis filter 408 to provide a finer spectral measurement of the audio signal. Sub-band analysis filters 408 divide each band into 8 uniform sub-bands. The outputs of sub-band analysis filters 408 are then used by psycho-acoustic analyzer 406 to determine the masking thresholds for each of the frequency components in the block. While the above embodiment splits each band into 8 sub-bands for the purpose of measuring the energy spectrum, it will be apparent to those skilled in the art that other numbers of sub-bands may be used. Furthermore, the number of sub-bands may be varied with the frequency band.

The manner in which an audio decompression system according to the present invention operates will now be explained with the aid of FIG. 10 which is a block diagram of an audio decompression system 410 for decompressing the compressed audio signals generated by a compression system such as that shown in FIG. 9. The compressed signal is first decoded to recover the quantized signal values by a decoder 412. The quantized signal values are then used to generate approximations to the filtered signal values by de-quantizer 414. Since the present invention utilizes multi-rate sampling, the number of filtered signal values depends on the specific frequency bands. In the case in point, there are 21 such bands. As discussed above, the five highest bands are sampled at 8 times the rate of the lowest 8 frequency bands, and the intermediate frequency bands are sampled at twice the rate of the lowest frequency bands. The filtered signal values are indicated by ^(k)S_(m), where m indicates the frequency band, and k indicates the number of the signal value relative to the lowest frequency bands, i.e., k runs from 1 to 8 for the highest frequency bands, and 1 to 2 for the intermediate frequency bands.

The filtered samples are inputted to an inverse sub-band filter 426 which generates an approximation to the original audio signal from the filtered signal values. Filter 402 shown in FIG. 9 and filter 426 form a perfect, or near perfect, reconstruction filter bank. Hence, if the filtered samples had not been replaced by approximations thereto by quantizer 404, the decompressed signal generated by filter bank 426 would exactly match the original audio signal input to filter 402 to a specified precision.

Inverse sub-band filter bank 426 also comprises a tree-structured filter bank. To distinguish the filters used in the inverse sub-band filters from those used in the sub-band filter banks which generated the filtered audio samples, the inverse filter banks will be referred to as synthesizers. The filtered signal values enter the tree at the leaf nodes thereof, and the reconstructed audio signal exits from the root node of the tree. The low and intermediate filtered samples pass through two levels of synthesizers. The first level of synthesizers are shown at 427 and 428. For each group of four filtered signal values accepted by synthesizers 427 and 428, four sequential values which represent filtered signal values in a frequency band which is four times wider are generated. Similarly, for each group of eight filtered signal values accepted by synthesizer 429, eight sequential values which represent filtered signal values in a frequency band which is eight times as wide are generated. Hence, the number of signal values entering synthesizer 430 on each input is now the same even though the number of signal values provided by de-quantizer 414 for each frequency band varied from band to band.

The synthesis of the audio signal from the sub-band components is carried out by analogous operations. Given M sub-band components that were obtained from 2M polyphase components P_(i), the original polyphase components can be obtained from the following matrix multiplication:

$$P_{i} = \sum_{k=0}^{M-1} S_{k} \cos\left[\frac{\left(i + \frac{M}{2}\right)(2k+1)\pi}{2M}\right] \tag{5a}$$

As noted above, there are a number of different cosine modulations that may be used. Eq. (5a) corresponds to modulation using the relationship shown in Eq. (3a). If the modulation shown in Eq. (3b) is utilized, then the polyphase components are obtained from the following matrix multiplication:

$$P_{i} = \sum_{k=0}^{M-1} S_{k} \cos\left[\frac{(2i+1+M)(2k+1)\pi}{4M}\right] \tag{5b}$$

The time domain samples x_(k) are computed from the polyphase components by the inverse of the windowing transform described above. A block diagram of a synthesizer according to the present invention is shown in FIG. 11 at 500. The M frequency components are first transformed into the corresponding polyphase components by a matrix multiplication shown at 510. The resultant 2M polyphase components are then shifted into a 2W entry shift register 512 and the oldest 2M values in the shift register are shifted out and discarded. The contents in the shift register are inputted to array generator 513 which builds a W value array 514 by iterating the following loop 8 times: take the first M samples from shift register 512, ignore the next 2M samples, then take the next M samples. The contents of array 514 are then multiplied by W weight coefficients, h′_(i), which are related to the h_(i) used in the corresponding sub-band analysis filter, to generate a set of weighted values w_(i)=h′_(i)*u_(i), which are stored in array 516. Here the u_(i) are the contents of array 514. The M time domain samples, x_(j) for j=0, . . . M−1, are then generated by summing circuit 518 which sums the appropriate w_(i) values, i.e.,

$$x_{j} = \sum_{i=0}^{W/M-1} w_{j+Mi} \tag{6}$$

While the above-described embodiments of synthesizers and sub-band analysis filters are described in terms of special purpose hardware for carrying out the various operations, it will be apparent to those skilled in the art that the entire operation may be carried out on a general purpose digital computer.
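As a rough software illustration of the array-building loop and the summation of Eq. (6), consider the sketch below. The function names are hypothetical. Because the loop is iterated 8 times and collects 2M values per pass, the sketch assumes W = 16M (so the 2W-entry register holds 32M values); the weights in the example are placeholder ones rather than real h′ coefficients.

```python
def build_array(shift_register, m):
    """Build the W-value array: take M values, skip 2M, take M, eight times."""
    u = []
    pos = 0
    for _ in range(8):
        u.extend(shift_register[pos:pos + m])   # take the first M samples
        pos += 3 * m                            # skip past them and the next 2M
        u.extend(shift_register[pos:pos + m])   # take the next M samples
        pos += m
    return u


def window_and_sum(u, h_prime, m):
    """Weight the array (w_i = h'_i * u_i) and sum per Eq. (6)."""
    w = [h_prime[i] * u[i] for i in range(len(u))]
    return [sum(w[j + m * i] for i in range(len(u) // m)) for j in range(m)]


m = 2
register = [float(v) for v in range(32 * m)]    # stand-in register contents
u = build_array(register, m)
x = window_and_sum(u, [1.0] * len(u), m)        # placeholder weights of 1.0
```

Each call consumes one snapshot of the shift register and produces M output samples; the register would be advanced by 2M new polyphase values between calls.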

As pointed out above, it would be advantageous to provide a single high-quality compressed audio signal that could be played back on a variety of playback platforms having varying computational capacities. Each such playback platform would reproduce the audio material at a quality consistent with the computational resources of the platform.

Furthermore, the quality of the playback should be capable of being varied in real time as the computational capability of the platform varies. This last requirement is particularly important in playback systems comprising multi-tasking computers. In such systems, the available computational capacity for the audio material varies in response to the computational needs of tasks having equal or higher priority. Prior art decompression systems do not provide this capability.

The present invention allows the quality of the playback to be varied in response to the computational capability of the playback platform without the use of multiple copies of the compressed material. Consider an audio signal that has been compressed using a sub-band analysis filter bank in which the window contains W audio samples. The computational workload required to decompress the audio signal is primarily determined by the computations carried out by the synthesizers. The computational workload inherent in a synthesizer is W multiplies and adds from the windowing operations and 2M² multiplies and adds from the matrix multiplication. The extent to which the filters approximate an ideal band pass filter, in general, depends on the number of samples in the window, i.e., W. As the number of samples increases, the discrepancy between the sub-band analysis filter performance and that of an ideal band pass filter decreases. For example, a filter utilizing 128 samples has a side lobe suppression in excess of 48 dB, while a filter utilizing 512 samples has a side lobe suppression in excess of 96 dB. Hence, synthesis quality can be traded for a reduction in computational workload if a smaller window is used for the synthesizers.

In the preferred embodiment of the present invention, the size of the window used to generate the sub-band analysis filters in the compression system is chosen to provide filters having 96 dB rejection of signal energy outside a filter band. This value is consistent with playback on a platform having 16 bit D/A converters. In the preferred embodiment of the present invention, this condition can be met by 512 samples. The prototype filter coefficients, h_(i), viewed as a function of i have a more or less sine-shaped appearance with tails extending from a maximum. The tails provide the corrections which result in the 96 dB rejection. If the tails are truncated, the filter bands would have substantially the same bandwidths and center frequencies as those obtained from the non-truncated coefficients. However, the rejection of signal energy outside a specific filter's band would be less than the 96 dB discussed above. As a result, a compression and decompression system based on the truncated filter would show significantly more aliasing than the non-truncated filter.

The present invention utilizes this observation to trade sound quality for a reduction in computational workload in the decompression apparatus. In the preferred embodiment of the present invention, the audio material is compressed using filters based on a non-truncated prototype filter. When the available computational capacity of the playback platform is insufficient to provide decompression using synthesis filters based on the non-truncated prototype filter, synthesizers based on the truncated filters are utilized. Truncating the prototype filter leads to synthesizers which have the same size window as those based on the non-truncated prototype. However, many of the filter coefficients used in the windowing operation are zero. Since the identity of the coefficients which are now zero is known, the multiplications and additions involving these coefficients can be eliminated. It is the elimination of these operations that provides the reduced computational workload.
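The saving from a truncated prototype can be sketched as follows. The function names, the toy 8-tap coefficient list, and the choice to keep a central run of coefficients (zeroing the tails symmetrically) are illustrative assumptions; the specification does not say how many coefficients are retained.

```python
def truncate_tails(h, keep):
    """Zero all but the central `keep` coefficients; the window length is unchanged."""
    start = (len(h) - keep) // 2
    return [c if start <= i < start + keep else 0.0 for i, c in enumerate(h)]


def nonzero_taps(h):
    """Precompute (index, coefficient) pairs once, since the zero positions are known."""
    return [(i, c) for i, c in enumerate(h) if c != 0.0]


def apply_window(u, taps):
    """Windowing that performs a multiply only for the surviving coefficients."""
    out = [0.0] * len(u)
    for i, c in taps:
        out[i] = c * u[i]
    return out


h = [0.1, 0.2, 0.5, 1.0, 1.0, 0.5, 0.2, 0.1]   # toy stand-in for a prototype
h_trunc = truncate_tails(h, keep=4)
taps = nonzero_taps(h_trunc)
weighted = apply_window([2.0] * 8, taps)
```

In this toy case the windowing step performs 4 multiplications instead of 8; the result matches multiplying by the zero-padded coefficients directly.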

It should be noted that many playback platforms use D/A converters with less than 16 bits. In these cases, the full 96 dB rejection is beyond the capability of the platform; hence, the system performance will not be adversely affected by using the truncated filter. These platforms also tend to be the less expensive computing systems, and hence, have lower computational capacity. Thus, the trade-off between computational capacity and audio quality is made at the filter level, and the resultant system provides an audio quality which is limited by its D/A converters rather than its computational capacity.

Another method for trading sound quality for a reduction in computational workload is to eliminate the synthesis steps that involve specific high-frequency components. If the sampled values in one or more of the high-frequency bands are below some predetermined threshold value, then the values can be replaced by zero. Since the specific components for which the substitution is made are known, the multiplications and additions involving these components may be eliminated, thereby reducing the computational workload. The magnitude of the distortion generated in the reconstructed audio signal will, of course, depend on the extent of the error made in replacing the sampled values by zeros. If the original values were small, then the degradation will be small. This is more often the case for the high-frequency filtered samples than for the low-frequency filtered samples. In addition, the human auditory system is less sensitive at high frequencies; hence, the distortion is less objectionable.

It should also be noted that the computational workload inherent in decompressing a particular piece of audio material varies during the material. For example, the high-frequency filtered samples may only have a significant amplitude during parts of the sound track. When the high-frequency components are absent, or are sufficiently small to be replaced by zeros without introducing noticeable distortions, the computational workload can be reduced by not performing the corresponding multiplications and additions. When the high-frequency components are large, e.g., during attacks, the computational workload is much higher.

It should be noted that the computational work associated with generating the P_(k) values from the S_(i) values can be organized by S_(i). That is, the contribution to each P_(k) from a given S_(i) is calculated, then the contribution to each P_(k) from S_(i+1), and so on. Since there are 2M P values involved with each value of S, the overhead involved in testing each value of S before proceeding with the multiplications and additions is small compared to the computations saved if a particular S value is 0 or deemed to be negligible. In the preferred embodiment of the present invention, the computations associated with S_(i) are skipped if the absolute value of S_(i) is less than some predetermined value, ε.
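The organization of the computation by S_(i) can be sketched as follows. This is a minimal illustration, not the synthesizer of the preferred embodiment: the coefficient table `coeff`, in which `coeff[i][k]` stands for the weight of S_(i) in P_(k), is a hypothetical placeholder for whatever weights the synthesis transform actually uses, and the function name is likewise an assumption.

```python
def accumulate_p(S, coeff, eps):
    """Accumulate P[k] = sum_i coeff[i][k] * S[i], skipping negligible S[i].

    `coeff` is a hypothetical table of synthesis weights. Returns the P
    values and the number of S values whose work was skipped.
    """
    num_p = len(coeff[0])
    P = [0.0] * num_p
    skipped = 0
    for i, s in enumerate(S):
        # One comparison per S value guards num_p multiply-add operations.
        if abs(s) < eps:
            skipped += 1
            continue
        row = coeff[i]
        for k in range(num_p):
            P[k] += row[k] * s
    return P, skipped


# Toy example: the middle component is below the threshold and is skipped.
P, skipped = accumulate_p(
    S=[1.0, 0.001, -2.0],
    coeff=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    eps=0.01,
)
```

Raising `eps` causes more components to be skipped, which lowers the workload at the cost of a larger approximation error.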

Because of the variation in workload, the preferred embodiment of the present invention utilizes a buffering system to reduce the required computational capacity from that needed to accommodate the peak workload to that needed to accommodate the average workload. In addition, this buffering facilitates the use of the above-described techniques for trading off the required computational capacity against sound quality. For example, when the computational workload is determined to be greater than that available, the value of ε can be increased which, in turn, reduces the number of calculations needed to generate the P_(k) values.

A block diagram of an audio decompression system utilizing the above-described variable computational load techniques is shown in FIG. 12 at 600. The incoming compressed audio stream is decoded by decoder 602 and de-quantizer 604 to generate sets of frequency components {S_(i)} which are used to reconstruct the time domain audio signal values. The output of synthesizer 606 is loaded into a FIFO buffer 608 which feeds a set of D/A converters 610 at a constant rate determined by clock 609. The outputs of the D/A converters are used to drive speakers 612. Buffer 608 generates a signal that indicates the number of time domain samples stored therein. This signal is used by controller 614 to adjust the parameters that control the computational complexity of the synthesis operations in synthesizer 606. When this number falls below a predetermined minimum value, the computational algorithm used by synthesizer 606 is adjusted to reduce the computational complexity, thereby increasing the number of time domain samples generated per unit time. For example, controller 614 can increase the value of ε described above. Alternatively, controller 614 could force all of the high-frequency components from bands having frequencies above some predetermined frequency to be zero. In this case, controller 614 also instructs de-quantizer 604 not to unpack the high-frequency components that are not going to be used in the synthesis of the signal. This provides additional computational savings. Finally, controller 614 could change the windowing algorithm, i.e., use a truncated prototype filter.

If the number of stored values exceeds a second predetermined value, controller 614 adjusts the computational algorithm to regain audio quality if synthesizer 606 is not currently running in a manner that provides the highest audio quality. In this case, controller 614 reverses the approximations introduced into synthesizer 606 discussed above.
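The two-threshold behavior of the controller can be sketched as a small state machine. The class name, the parameter names `low_mark` and `high_mark`, and the idea of stepping through a fixed list of ε values are assumptions made for illustration; the specification only requires that complexity be reduced when the buffer level falls below one value and restored when it exceeds a second value.

```python
class QualityController:
    """Hypothetical controller that picks a skip threshold from the buffer level."""

    def __init__(self, low_mark, high_mark, eps_levels):
        if low_mark >= high_mark:
            raise ValueError("low_mark must be below high_mark")
        self.low_mark = low_mark        # below this, reduce complexity
        self.high_mark = high_mark      # above this, restore quality
        self.eps_levels = eps_levels    # index 0 is the highest quality
        self.level = 0

    def update(self, buffered_samples):
        """Return the threshold to use, given the number of buffered samples."""
        if buffered_samples < self.low_mark:
            # Buffer is draining: step to a cheaper setting if one exists.
            self.level = min(self.level + 1, len(self.eps_levels) - 1)
        elif buffered_samples > self.high_mark:
            # Buffer is comfortably full: step back toward full quality.
            self.level = max(self.level - 1, 0)
        return self.eps_levels[self.level]


controller = QualityController(low_mark=100, high_mark=400,
                               eps_levels=[0.0, 0.01, 0.05])
history = [controller.update(n) for n in (50, 50, 50, 200, 500, 500)]
```

Between the two marks the current setting is held, which avoids toggling the algorithm on every small fluctuation of the buffer level.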

While audio decompression system 600 has been discussed in terms of individual computational elements, it will be apparent to those skilled in the art that the functions of decoder 602, de-quantizer 604, synthesizer 606, buffer 608 and controller 614 can be implemented on a general purpose digital computer. In this case, the functions provided by clock 609 may be provided by the computer's clock circuitry.

In stereophonic decompression systems having parallel computational capacity, two synthesizers may be utilized. A stereophonic decompression system according to the present invention is shown in FIG. 13 at 700. The incoming compressed audio signal is decoded by a decoder 702 and de-quantized by de-quantizer 704 which generates two sets of frequency components 705 and 706. Set 705 is used to regenerate the time domain signal for the left channel with the aid of synthesizer 708, and set 706 is used to generate the time domain signal for the right channel with the aid of synthesizer 709. The outputs of the synthesizers are stored in buffers 710 and 712 which feed time domain audio samples at regular intervals to D/A converters 714 and 715, respectively. The timing of the signal feed is determined by clock 720. The operation of decompression system 700 is controlled by a controller 713 which operates in a manner analogous to controller 614 described above.

If a stereophonic decompression system does not have parallel computational capacity, then the regeneration of the left and right audio channels must be carried out by time-sharing a single synthesizer. When the computational workload exceeds the capacity of the decompression system, the trade-offs discussed above may be utilized to trade audio quality for a reduction in the computational workload. In addition, the computational workload may be reduced by switching to a monaural reproduction mode, thereby reducing the computational workload imposed by the synthesis operations by a factor of two.

A stereophonic decompression system using this type of serial computation system is shown in FIG. 14 at 800. The incoming compressed audio signal is decoded by a decoder 802 and de-quantized by de-quantizer 804 which generates sets of frequency components for use in synthesizing the left and right audio signals. When there is sufficient computational capacity available to synthesize both left and right channels, controller 813 time shares synthesizer 806 with the aid of switches 805 and 806. When there is insufficient computational capacity, controller 813 causes switch 805 to construct a single set of frequency components by averaging the corresponding frequency components in the left and right channels. The resulting set of frequency components is then used to synthesize a single set of monaural time domain samples which is stored in buffers 810 and 812.
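The switch between stereophonic and monaural operation can be sketched in a few lines. The function names and the boolean `enough_capacity` flag are assumptions for illustration; how the controller actually measures available capacity is not shown here.

```python
def average_channels(left, right):
    """Average corresponding left and right frequency components."""
    if len(left) != len(right):
        raise ValueError("left and right sets must have the same length")
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def sets_to_synthesize(left, right, enough_capacity):
    """Return the component sets the single synthesizer must process.

    With enough capacity, both channels are synthesized in turn. Otherwise
    one averaged set is synthesized once and its output is used for both
    output buffers, halving the synthesis workload.
    """
    if enough_capacity:
        return [left, right]
    return [average_channels(left, right)]


left = [1.0, -2.0, 0.5]
right = [3.0, 2.0, 0.5]
stereo_work = sets_to_synthesize(left, right, enough_capacity=True)
mono_work = sets_to_synthesize(left, right, enough_capacity=False)
```

Because the averaging is performed on the frequency components before synthesis, only one synthesis pass per frame is needed in the monaural mode.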

The techniques described above for varying the computational complexity required to synthesize a signal may also be applied to vary the computational complexity required to analyze a signal. This is particularly important in situations in which the audio signal must be compressed in real time prior to being distributed through a communication link having a capacity which is less than that needed to transmit the uncompressed audio signal. If a computational platform having sufficient capacity to compress the audio signal at full audio quality is available, the methods discussed above can be utilized.

However, there are situations in which the computational capacity of the compression platform may be limited. This can occur when the computational platform has insufficient computing power, or in cases in which the platform performing the compression may also include a general purpose computer that is time-sharing its capacity among a plurality of tasks. In the latter case, the ability to trade off computational workload against audio quality is particularly important.

A block diagram of an audio compression apparatus utilizing variable computational complexity is shown in FIG. 15 at 850. Compression apparatus 850 must provide a compressed signal to a communication link. For the purposes of this discussion, it will be assumed that the communication link requires a predetermined amount of data for regenerating the audio signal at the other end of the communication link. Incoming audio signal values from an audio source such as microphone 852 are digitized and stored in buffer 854. In the case of stereophonic systems, a second audio stream is provided by microphone 851. To simplify the following discussion, it will be assumed that apparatus 850 is operating in a monaural mode unless otherwise indicated. In this case, only one of the microphones provides signal values.

When M such signal values have been received, sub-band analysis filter bank 856 generates M signal components from these samples while the next M audio samples are being received. The signal components are then quantized and coded by quantizer 858 and stored in an output buffer 860. The compressed audio signal data is then transmitted to the communication link at a regular rate that is determined by clock 862 and controller 864.

Consider the case in which sub-band analysis filter bank 856 utilizes a computational platform that is shared with other applications running on the platform. When the computational capacity is restricted, sub-band analysis filter bank 856 will not be able to process incoming signal values at the same rate at which said signal values are received. As a result, the number of signal values stored in buffer 854 will increase. Controller 864 periodically senses the number of values stored in buffer 854. If the number of values exceeds a predetermined number, controller 864 alters the operations of sub-band analysis filter bank 856 in a manner that decreases the computational workload of the analysis process. The audio signal synthesized from the resulting compressed audio signal will be of lesser quality than the original audio signal; however, compression apparatus 850 will be able to keep up with the incoming data rate. When controller 864 senses that the number of samples in buffer 854 has returned to a safe operating level, it alters the operation of sub-band analysis filter bank 856 in such a manner that the computational workload and audio quality increase.

Many of the techniques described above may be used to vary the computational workload of the sub-band analysis filter. First, the prototype filter may be replaced by a shorter filter or a truncated filter, thereby reducing the computational workload of the windowing operation. Second, the higher frequency signal components can be replaced by zeros. This has the effect of reducing “M” and thereby reducing the computational workload.

Third, in stereophonic systems, the audio signals from each of the microphones 851 and 852 can be combined by circuitry in buffer 854 to form a monaural signal which is analyzed. The compressed monaural signal is then used for both the left and right channel signals.

However, for the purposes of the present discussion, it is sufficient to note that the filters may be implemented as finite impulse response filters with real filter coefficients. If the synthesis filter generates M coefficients per frame representing the amplitude of the transmitted signal, the filter bank accepts M frequency-domain symbols and generates M time-domain coefficients. However, it should be noted that the M coefficients generated may also depend on symbols received prior to the M frequency-domain symbols of the current frame. Similarly, the analysis filter bank demodulates M frequency-domain symbols from M time-domain received signal values in a given frame, and the resulting M symbols may depend on previous frames of M time-domain signal values processed by the filter bank.

The communication bandwidth may alternatively be broken up into subbands of distinct (nonuniform) bandwidths by means of a single nonuniform filter bank transform. The synthesis filter bank, or frequency-domain-to-time-domain transform for converting symbols into signal values for transmission, is depicted in FIG. 16 at 300 for a system having K subchannels. If the subchannels are nonuniform in their bandwidth, distinct subchannels of the filter bank will operate at different upsampling rates; the upsampling rate of the k^(th) subchannel will be denoted by M_(k). The upsampling rates are subject to the critical sampling condition

$\begin{matrix}{{\sum\limits_{k = 0}^{K - 1}\frac{1}{M_{k}}} = 1} & (7)\end{matrix}$

Referring to FIG. 16, synthesis filter bank 300 generates M_(tot) time-domain samples in each time frame. Here, M_(tot) is the least common multiple of the upsampling rates M_(k) provided by the upsamplers of which 302 is typical. Define the integers n_(k) by

$\begin{matrix}{n_{k} = {\frac{M_{tot}}{M_{k}}.}} & (8)\end{matrix}$
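The frame bookkeeping implied by the critical sampling condition can be checked numerically. The sketch below assumes that n_(k) is M_(tot) divided by the upsampling rate M_(k) of the k^(th) subchannel, which is the reading under which the n_(k) sum to M_(tot); the function name `frame_layout` and the example rates are illustrative only.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def frame_layout(rates):
    """Return (M_tot, [n_k]) for a list of upsampling rates M_k.

    Raises ValueError if the rates violate the critical sampling
    condition sum_k 1/M_k = 1.
    """
    # Exact rational arithmetic avoids floating-point round-off in the check.
    if sum(Fraction(1, m) for m in rates) != 1:
        raise ValueError("rates do not satisfy the critical sampling condition")
    m_tot = reduce(lambda a, b: a * b // gcd(a, b), rates)  # least common multiple
    n = [m_tot // m for m in rates]  # symbols per frame on each subchannel
    return m_tot, n


# Example: one half-band subchannel and two quarter-band subchannels.
m_tot, n = frame_layout([2, 4, 4])
```

For these rates the frame holds four time-domain samples and carries two symbols on the wide subchannel and one on each narrow subchannel, so the number of symbols per frame equals the number of samples per frame.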

In each frame of transform processing, n_(k) symbols, denoted by s_(k,i), are mapped onto the k^(th) subchannel using the sequence, f_(k), as the modulating waveform to generate a time domain sequence, x_(k), representing the symbols in the k^(th) subchannel, i.e.,

$\begin{matrix}{{x_{k}\lbrack n\rbrack} = {\sum\limits_{i}\;{s_{k,i}{f_{k}\left\lbrack {n - {iM}_{k}} \right\rbrack}}}} & (9)\end{matrix}$

Note that symbols from previous frames may contribute to the output of a given frame. The contributions x_(k) from the K distinct subchannels are added together, as shown at 301, to produce a set of M_(tot) time-domain signal values x[n] from M_(tot) input symbols s_(k,i) during the given frame. The k^(th) subchannel will have a bandwidth that is 1/M_(k) as large as that occupied by the full transmitted signal.
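The mapping of symbols onto subchannel waveforms and the summation of the subchannel contributions can be sketched directly from the expression for x_(k)[n]. The waveforms used in the example are arbitrary toy sequences chosen so the arithmetic is easy to follow; they are not perfect-reconstruction filters, and the function names are assumptions.

```python
def synthesize_subchannel(symbols, f, m_k, length):
    """Compute x_k[n] = sum_i s_{k,i} * f_k[n - i*M_k] for 0 <= n < length."""
    x = [0.0] * length
    for i, s in enumerate(symbols):
        for j, tap in enumerate(f):
            n = i * m_k + j  # f_k[j] lands at output index n = j + i*M_k
            if n < length:
                x[n] += s * tap
    return x


def synthesize(subchannels, length):
    """Sum the contributions of all subchannels.

    `subchannels` is a list of (symbols, waveform, upsampling_rate) tuples.
    """
    out = [0.0] * length
    for symbols, f, m_k in subchannels:
        x_k = synthesize_subchannel(symbols, f, m_k, length)
        out = [a + b for a, b in zip(out, x_k)]
    return out


# Toy frame with upsampling rates 2, 4 and 4, so M_tot = 4 samples carry
# 2 + 1 + 1 = 4 symbols.
x = synthesize(
    [
        ([1.0, 2.0], [1.0, 0.5], 2),
        ([3.0], [1.0, 1.0, 1.0, 1.0], 4),
        ([-1.0], [1.0, -1.0, 1.0, -1.0], 4),
    ],
    length=4,
)
```

In a practical system the waveforms are longer than one frame, so samples that fall beyond `length` would be carried into the following frames rather than discarded as they are in this sketch.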

At the receiver, the incoming discrete signal values x′[n] are passed through an analysis filter bank 400, depicted in FIG. 17. The received signal values are denoted by x′ to emphasize that the samples have been altered by the transmission link. Each filter in this bank has a characteristic downsampling ratio M_(k) imposed after filtering by a finite impulse response filter, producing a set of M_(tot) output symbols s′ per frame. A typical filter is shown at 401 with its corresponding downsampler at 402. The output symbol stream for the k^(th) subchannel is given by

$\begin{matrix}{s_{k,n}^{\prime} = {\sum\limits_{i}\;{{x^{\prime}\left\lbrack {i - {nM}_{k}} \right\rbrack}*{H_{k}\lbrack i\rbrack}}}} & (10)\end{matrix}$

Again, input signal values from preceding frames may contribute to the set of symbols output during a given frame.

We require that in an ideal channel, the subchannel waveforms, f_(k), together with the receive filters H_(k) satisfy perfect-reconstruction or near-perfect-reconstruction conditions, with an output symbol stream that is identical (except for a possible delay of an integer number of samples) to the input symbol stream. This is equivalent to the absence of inter-symbol and inter-channel interference upon reconstruction. Methods for the design of such finite-impulse-response filter bank waveforms are known to the art. The reader is referred to J. Li, T. Q. Nguyen, S. Tantaratana, “A simple design method for nonuniform multirate filter banks,” in Proc. Asilomar Conf. on Signals, Systems, and Computers, November 1994 for a detailed discussion of such filter banks.

Various modifications to the present invention will become apparent to those skilled in the art from the foregoing description and accompanying drawings. Accordingly, the present invention is to be limited solely by the scope of the following claims.

1. A communication system for sending a sequence of symbols on a communication link a sequence of symbols having values representative of said symbols, said communication system comprising a transmitter for placing information indicative of said sequence of symbols on said communication link and a receiver for receiving said information placed on said communication link by said transmitter, said transmitter comprising a clock for defining successive frames, each said frame comprising M time intervals, where M is an integer greater than 1; a modulator modulating each of M carrier signals with a signal related to the value of one of said symbols thereby generating a modulated carrier signal corresponding to each of said carrier signals that is to be modulated and generating a sum signal comprising a sum of said modulated carrier signals, said modulator comprising a tree-structured array of filter banks having nodes, including a root node and M leaf nodes, each of said values related to said symbols forming an input to a corresponding one of said leaf nodes, each of said nodes, other than said leaf nodes, comprising one of said filter banks; and an output circuit for transmitting said sum signal on said communication link, wherein said carrier signals comprise first and second carriers, said first carrier having a different bandwidth than said second carrier.
2. The communication system of claim 1 wherein said receiver comprises: an input circuit for receiving and storing M time-domain samples transmitted on said communication link; and a decoder for recovering said M symbol values, said decoder comprising a tree-structured array of sub-band filter banks, said received M time-domain samples forming the input of a root node of said tree-structured array of said decoder and said M symbol values being generated by the leaf nodes of said tree-structured array of said decoder, each said sub-band filter bank comprising a plurality of FIR filters having a common input for receiving an input time-domain signal, each said filter generating an output signal representing a symbol value in a corresponding frequency band.
3. A communication system for sending a sequence of symbols on a communication link, said communication system comprising a transmitter for placing information indicative of said sequence of symbols on said communication link, said transmitter comprising: a clock for defining successive frames, each said frame comprising M time intervals, where M is an integer greater than 1; a modulator modulating each of M carrier signals with a signal related to the value of one of said symbols thereby generating a modulated carrier signal corresponding to each of said carrier signals that is to be modulated and generating a sum signal comprising a sum of said modulated carrier signals; an output circuit transmitting said sum signal on said communication link, wherein said carrier signals comprise first and second carriers, said first carrier having a different bandwidth than said second carrier; and a receiver comprising: an input circuit for receiving and storing M time-domain samples transmitted on said communication link; and a decoder for recovering said M symbol values, said decoder comprising a tree-structured array of sub-band filter banks, said received M time-domain samples forming the input of a root node of said tree-structured array of said decoder and said M symbol values being generated by the leaf nodes of said tree-structured array of said decoder, each said sub-band filter bank comprising a plurality of FIR filters having a common input for receiving an input time-domain signal, each said filter generating an output signal representing a symbol value in a corresponding frequency band.
4. The communication system of claim 3 wherein said modulator comprises a tree-structured array of filter banks having nodes, including a root node and M leaf nodes, each of said values related to said symbols forming an input to a corresponding one of said leaf nodes, each of said nodes, other than said leaf nodes, comprising one of said filter banks.
5. A stereophonic audio signal decompression method comprising: decoding, using a decoder, a compressed stereophonic audio signal; de-quantizing, using a de-quantizer, the compressed stereophonic audio signal to generate sets of frequency components for synthesizing left and right audio signals; switching, using a controller, to constructing a single set of frequency components by averaging corresponding frequency components in the left and right audio signals when a computational workload exceeds a capacity of a decompression system; and synthesizing, using a synthesizer, a monaural audio time domain signal.